Data Mining and Model Simplicity: A Case Study in Diagnosis

Authors

  • Gregory M. Provan
  • Moninder Singh
Abstract

We describe the results of data mining in a challenging medical diagnosis domain: acute abdominal pain. This domain is well known to be difficult, with most human and machine diagnosticians achieving little more than 60% predictive accuracy. Moreover, many researchers argue that one of the simplest approaches, the naive Bayesian classifier, is optimal. By comparing the naive Bayesian classifier with its more general cousin, the Bayesian network classifier, and with selective Bayesian classifiers that use just 10% of the total attributes, we show that the simplest models perform at least as well as the more complex ones. We argue that simple models such as the selective naive Bayesian classifier will perform as well as more complicated models in similarly complex domains with relatively small data sets, which calls into question the extra expense of inducing more complex models.
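As a rough illustration of what a selective naive Bayesian classifier does, the sketch below fits a categorical naive Bayes model on a subset of attributes and grows that subset greedily while held-out accuracy improves. The abstract does not say how the authors selected attributes, so the greedy forward search, the Laplace smoothing, the function names (`train_nb`, `predict_nb`, `select_attributes`) and the toy data are all assumptions made for this example, not the paper's method or data.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, attrs):
    """Fit a categorical naive Bayes model using only the attribute indices in attrs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute, class) -> counts of each value
    values = defaultdict(set)            # attribute -> distinct values seen in training
    for row, y in zip(rows, labels):
        for a in attrs:
            value_counts[(a, y)][row[a]] += 1
            values[a].add(row[a])
    return class_counts, value_counts, values, list(attrs)


def predict_nb(model, row):
    """Return the class with the highest log posterior (ties go to the first class seen)."""
    class_counts, value_counts, values, attrs = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)
        for a in attrs:
            count = value_counts[(a, y)][row[a]]
            # Laplace (add-one) smoothing; an assumption of this sketch.
            score += math.log((count + 1) / (n_y + len(values[a])))
        if score > best_score:
            best, best_score = y, score
    return best


def accuracy(train, test, attrs):
    """Train on `train` with the given attributes and return accuracy on `test`."""
    (x_train, y_train), (x_test, y_test) = train, test
    model = train_nb(x_train, y_train, attrs)
    hits = sum(predict_nb(model, r) == y for r, y in zip(x_test, y_test))
    return hits / len(y_test)


def select_attributes(train, valid, n_attrs, max_attrs):
    """Greedy forward selection: add the attribute that most improves validation accuracy."""
    selected = []
    best_acc = accuracy(train, valid, selected)  # no attributes = class-prior baseline
    while len(selected) < max_attrs:
        candidates = [a for a in range(n_attrs) if a not in selected]
        if not candidates:
            break
        scored = [(accuracy(train, valid, selected + [a]), a) for a in candidates]
        acc, a = max(scored, key=lambda t: (t[0], -t[1]))  # ties: lowest index
        if acc <= best_acc:
            break  # stop when no candidate strictly improves accuracy
        selected.append(a)
        best_acc = acc
    return selected, best_acc


# Hypothetical toy data: attribute 0 determines the class, the others carry no signal.
x_train = [("a", "x", "p"), ("a", "x", "q"), ("b", "x", "p"), ("b", "x", "q")]
y_train = ["A", "A", "B", "B"]
x_valid = [("a", "x", "q"), ("b", "x", "p")]
y_valid = ["A", "B"]

selected, acc = select_attributes((x_train, y_train), (x_valid, y_valid),
                                  n_attrs=3, max_attrs=1)
```

Capping `max_attrs` at roughly a tenth of the attribute count would mirror the "10% of the total attributes" setting mentioned in the abstract; on a real data set, cross-validation would be a more reliable selection criterion than a single held-out split.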

Similar resources

Comparison of Three Decision-Making Models in Differentiating Five Types of Heart Disease: A Case Study in Ghaem Sub-Specialty Hospital

Introduction: Cardiovascular diseases are becoming the main cause of mortality and morbidity in most countries. The goal of this research was to predict the types of heart disease for more accurate diagnosis using data mining and neural network techniques. Method: This was an applied survey study; after data preprocessing, three approaches of neural network, decision tree and Bayes simp...

A Case Study of the Impact of Parental Diseases on the Probability of Hypertension Using Data Mining Techniques

Introduction: Hypertension is one of the most common health problems. As it has a major impact on other serious diseases such as cardiovascular diseases and strokes, and due to not having any specific symptoms, it is known as a silent killer. Therefore, proper diagnosis, control, and treatment of hypertension is crucial in health care systems and will indeed prevent the development of the other...

Approximate resistivity and susceptibility mapping from airborne electromagnetic and magnetic data, a case study for a geologically plausible porphyry copper unit in Iran

This paper describes the application of approximate methods to invert airborne magnetic data and helicopter-borne frequency-domain electromagnetic data in order to retrieve a joint model of magnetic susceptibility and electrical resistivity. The study area, located in Semnan province of Iran, consists of an arc-shaped porphyry andesite covered by sedimentary units which may have potential ...

A Probabilistic Bayesian Classifier Approach for Breast Cancer Diagnosis and Prognosis

Medical diagnosis problems are among the most important components of treatment policies. Recently, significant advances have been made in medical diagnosis using data mining techniques. Data mining, or knowledge discovery, is the search of large databases to discover patterns and evaluate the probability of future occurrences. In this paper, Bayesian Classifier is used as a Non-linear dat...

Publication date: 1996